DOCUMENT RESUME

ED 445 082                                                   TM 031 759

AUTHOR       Alexander, Erika D.
TITLE        Using Canonical Correlation To Explore Relationships between
             Sets of Variables: An Applied Example with Interpretive
             Suggestions.
PUB DATE     2000-01-00
NOTE         16p.; Paper presented at the Annual Meeting of the Southwest
             Educational Research Association (Dallas, TX, January 27-29,
             2000).
PUB TYPE     Reports - Descriptive (141) -- Speeches/Meeting Papers (150)
EDRS PRICE   MF01/PC01 Plus Postage.
DESCRIPTORS  Correlation; *Multivariate Analysis
IDENTIFIERS  *Variables (Mathematics)



ABSTRACT 



Canonical correlation analysis is a parsimonious way of 
breaking down the association between two sets of variables through the use 
of linear combinations. As a result of the analysis, many types of 
coefficients can be generated and interpreted. These coefficients are only 
considered stable and reliable if the number of subjects per variable is 
sufficiently large. The first of these coefficients, the canonical 
correlation, is the bivariate correlation between the composite scores for 
the two sets of variables. Two additional coefficients, the canonical 
function and structure coefficients, address the contribution a single 
variable makes to the explanatory power of the set of variables to which the 
variable belongs. The communality coefficient explains how useful the 
variable is in defining the canonical solution. The adequacy coefficient 
indicates how adequately the analysis represents the total variance in the 
unweighted set. The extent to which a variable contributes to explaining the 
composite of the variable set to which the variable of interest does not 
belong is the index coefficient. A final outcome from canonical correlation 
analysis is the redundancy coefficient, which indicates the average 
proportion of variance for variables in one set that is reproducible with the 
variables in the other set. While the coefficient is easy to calculate, it is 
not recommended for interpretation in most cases. (Contains 3 tables and 10
references.) (SLD)






USING CANONICAL CORRELATION TO EXPLORE RELATIONSHIPS 
BETWEEN SETS OF VARIABLES: AN APPLIED EXAMPLE WITH 
INTERPRETIVE SUGGESTIONS 



Erika D. Alexander 
University of North Texas 






Paper presented at the annual meeting of the Southwest Educational Research 
Association, Dallas, Texas, January 27-29, 2000. 






Introduction 



Canonical correlation analysis is a procedure for exploring the relationship 
between two sets of variables containing two or more variables each. As argued by 
Baggaley (1981), the multivariate technique is the most general case of the general linear 
model. It is usually employed because the researcher wants to consider the simultaneous
workings of all the variables of interest. As noted by Thompson (1984),
canonical correlation analysis is appealing because, although multivariate in nature, it can
be presented in bivariate terms. This is achieved by the calculation of many different
coefficients and correlations, each of which answers research questions detailing different
aspects of the analysis. This paper presents seven different coefficients that are generated
as a result of canonical correlation analysis. Each coefficient is described and interpreted 
using a practical example. 

Data Example 

To illustrate a practical application of canonical correlation, data from a large
West Coast law school were used to determine the relationship between two sets of
variables and their influence on admitted students’ decisions whether or not to attend the
law school. Eight variables were analyzed using SAS statistical software for the dataset
containing 251 observations. The predictor set contained five variables that were
measurements of the influence of direct contact of the law school with the admitted
students: the admission bulletin, direct mail marketing pieces, official law school forum,
law school representative, and visits to the campus. The criterion set contained three
variables that were measurements of outside influences, or influences on the admitted
students that are beyond the control of the law school: undergraduate pre-law advisor,
attorneys, and parental influences.

According to Barcikowski and Stevens’s (1975) Monte Carlo study on the
stability of the canonical coefficients and correlations, the number of subjects per
variable required to achieve reliable results in interpreting the largest canonical
correlation should be at least 20/1. When considering the two largest canonical
correlations, a ratio ranging from 42/1 to 68/1 should be considered. In this example, the
ratio of 31/1 (251 observations for eight variables) exceeds the 20/1 guideline and is
sufficient to achieve stable and reliable coefficients for the largest canonical correlation.
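
The ratio quoted above follows directly from the sample size and the number of variables. The short Python sketch below reproduces that arithmetic; the function name is illustrative only and is not part of the original analysis, which was run in SAS.

```python
def subjects_per_variable(n_subjects: int, n_variables: int) -> float:
    """Return the number of subjects available per analyzed variable."""
    return n_subjects / n_variables


# Figures from the data example: 251 observations, 5 + 3 = 8 variables.
ratio = subjects_per_variable(251, 5 + 3)

# 20/1 is the guideline cited for interpreting the largest canonical correlation.
meets_guideline = ratio >= 20

print(f"{ratio:.1f} subjects per variable; meets 20/1 guideline: {meets_guideline}")
```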

Analysis Results 

To obtain an understanding of how well the variables are related to one another,
Pearson correlations were calculated for each pair of variables, both within and across
sets (see Table 1). The degree to which two predictors correlate is the degree to which
they are said to be collinear. The within-set collinearities revealed some moderate
values, the largest being between FORUM and REPRESENTATIVE among the direct contact
measurements and between ATTORNEY and PARENTS among the outside influence
measurements. When considering the between-set correlations, one notices that ADVISOR is
moderately correlated with three of the five direct contact measures and that
REPRESENTATIVE is appreciably correlated with each of the outside influence measures.
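
A correlation table of this kind can be thought of as one 8 x 8 matrix partitioned into within-set and between-set blocks. The sketch below illustrates the partitioning with NumPy on synthetic data; the data, seed, and variable names are assumptions made for illustration and do not reproduce the law school data or the SAS output.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 251

# Synthetic stand-in data (NOT the law school data): a shared factor
# induces some correlation within and between the two sets.
shared = rng.normal(size=(n, 1))
X = 0.6 * shared + rng.normal(size=(n, 5))  # "predictor" set, 5 variables
Y = 0.6 * shared + rng.normal(size=(n, 3))  # "criterion" set, 3 variables

# Correlate all 8 columns at once, then slice the 8 x 8 matrix into blocks.
R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
p = X.shape[1]
R_xx = R[:p, :p]  # within-set correlations, predictor set (5 x 5)
R_yy = R[p:, p:]  # within-set correlations, criterion set (3 x 3)
R_xy = R[:p, p:]  # between-set correlations (5 x 3)

print(np.round(R_xy, 2))
```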

In canonical correlation analysis, an important measure to consult is the canonical
correlation coefficient (Rc). According to Thompson (1984), conventional canonical
correlation analysis begins by "collapsing each person’s scores on the variables in
each variable set into a single composite variable." The bivariate correlation between the
composite scores for the two sets of variables is the canonical correlation. As explained
by Tatsuoka (1971), the total number of possible canonical correlations is equal to
min(p, q), where p is the number of variables in the first set and q is the number of
variables in the second set. Therefore, in this example, there are three [min(5, 3) = 3]
canonical correlations yielded by the analysis.
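
One common numerical route to the canonical correlations is to orthonormalize each centered variable set and take the singular values of the cross-product of the two bases. The NumPy sketch below follows that route on synthetic data; it is an illustration under those assumptions, not the procedure SAS used for the paper, and the function name is hypothetical.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and the columns of Y."""
    Xc = X - X.mean(axis=0)  # center each variable
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis for each variable set
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations, largest first.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


rng = np.random.default_rng(1)
n = 251
shared = rng.normal(size=(n, 1))            # synthetic data, for illustration only
X = 0.8 * shared + rng.normal(size=(n, 5))  # p = 5 variables
Y = 0.8 * shared + rng.normal(size=(n, 3))  # q = 3 variables

rc = canonical_correlations(X, Y)
print(len(rc), np.round(rc, 3))  # min(5, 3) = 3 canonical correlations
```

Because a single variable is itself a (trivial) linear combination, the first canonical correlation can never be smaller in absolute value than any individual between-set correlation.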

As seen in Table 2, the two largest canonical correlations, Rc1 = 0.65 and Rc2 = 0.27,
are both statistically significant at the 0.05 level. One also notices that Rc1 is larger than
any of the between-set correlations. According to Stevens (1996), even though a
canonical correlation can be found to be statistically significant, a weak canonical
correlation (Rc < 0.30, Rc² < 0.09) may be trivial and of little practical value. Therefore, the
researcher may decide a trivial canonical function is not worth interpreting. Because
there is such a large decrease in value between Rc1 and Rc2 and because the squared second
canonical correlation (0.07) is small, only the first canonical function will be interpreted.
Results from all three functions are presented in Table 3.
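
The screening rule attributed to Stevens (1996) amounts to comparing each squared canonical correlation with 0.09. A minimal sketch of that comparison, using only the two values quoted in the text (the function name and default cutoff argument are illustrative):

```python
def practically_trivial(rc_squared: float, cutoff: float = 0.09) -> bool:
    """Flag a function whose squared canonical correlation is below the cutoff."""
    return rc_squared < cutoff


rc1 = 0.65              # first canonical correlation reported in the text
rc1_squared = rc1 ** 2  # about 0.42 of the variance shared by the two variates
rc2_squared = 0.07      # second squared canonical correlation reported in the text

print(round(rc1_squared, 2), practically_trivial(rc1_squared))
print(rc2_squared, practically_trivial(rc2_squared))
```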

Result Interpretation 

Once a canonical function is identified for interpretation, a number of coefficients 
may be calculated and consulted to answer various research questions (Thompson, 1984). 
Of interest to researchers is the contribution a single variable makes to the explanatory 
power of the set of variables to which the variable belongs. Two coefficients that address 
this question are the canonical function and structure coefficients. Similar to beta
weights in regression, standardized function coefficients are weights applied to the
standardized data, which are then summed to create the synthetic variables, or canonical
variates (Thompson, 1991). When observing the standardized function coefficients for
the direct contact measurements in the first function, one notices that FORUM and
REPRESENTATIVE appear to be making the largest contribution, with the other three
variables making contributions that are small and similar in size. For the outside
influence variables, ADVISOR is making more than twice the contribution of either of the
other two variables.
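
To make the "weights applied to the standardized data" concrete, the sketch below computes standardized function coefficients for synthetic data and forms the first pair of canonical variates as weighted sums of z-scores. The helper name, the synthetic data, and the QR/SVD route are assumptions for illustration; they are not the paper's SAS procedure.

```python
import numpy as np


def standardized_cca(X, Y):
    """Return canonical correlations, standardized function coefficients, z-scores."""
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    n, k = X.shape[0], min(X.shape[1], Y.shape[1])
    Qx, Rx = np.linalg.qr(Zx)
    Qy, Ry = np.linalg.qr(Zy)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    # Scale the weights so that every canonical variate has unit variance.
    A = np.linalg.solve(Rx, U[:, :k]) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T[:, :k]) * np.sqrt(n - 1)
    return s[:k], A, B, Zx, Zy


rng = np.random.default_rng(2)
n = 251
shared = rng.normal(size=(n, 1))            # synthetic data, for illustration only
X = 0.8 * shared + rng.normal(size=(n, 5))
Y = 0.8 * shared + rng.normal(size=(n, 3))

rc, A, B, Zx, Zy = standardized_cca(X, Y)

# Canonical variates for the first function: weighted sums of the z-scores.
u1 = Zx @ A[:, 0]
v1 = Zy @ B[:, 0]

# Their bivariate correlation is the first canonical correlation.
r_uv = np.corrcoef(u1, v1)[0, 1]
print(np.round(A[:, 0], 3), round(float(r_uv), 3), round(float(rc[0]), 3))
```

The sign of a whole column of coefficients is arbitrary, so only the relative sizes and signs within a function carry meaning.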

As Kerlinger & Pedhazur (1973), Levine (1977), and Meredith (1964) argue, it is
important to interpret canonical results based not only on function coefficients, but on
structure coefficients as well. Structure coefficients are the bivariate correlations
between the observed variables in a set and the synthetic variable created from that set by
the linear combination, and they generally take into account the collinearity, or overlap,
of the set of variables. In this example, function and structure coefficients yield similar
results. When observing the standardized structure coefficients for the direct contact
measurements in the first function, FORUM and REPRESENTATIVE are making the largest
contribution, with the other three variables making contributions that are smaller and
similar in size. For the outside influence variables, ADVISOR is making the largest
contribution. To obtain an estimate of the proportion of variance a variable shares with its
canonical composite, the structure coefficient is squared. According to Table 3, FORUM and
REPRESENTATIVE share 80% and 74% of their variance with the direct contact variate,
respectively, with BULLETIN, MAIL, and CAMPUS sharing much smaller
proportions. For the outside influence variable set, ADVISOR shares 78% of its variance
with the variate, while ATTORNEY and PARENTS each share less than 40%.
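
Structure coefficients can be obtained by correlating each observed variable with the canonical variates of its own set. The sketch below does this for synthetic data; the function name and data are hypothetical, and the computation is an illustration, not the SAS output behind Table 3.

```python
import numpy as np


def structure_coefficients(X, Y):
    """Correlations of each X variable with each canonical variate of the X set."""
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Yc = Y - Y.mean(axis=0)
    n, k = X.shape[0], min(X.shape[1], Y.shape[1])
    Qx, _ = np.linalg.qr(Zx)
    Qy, _ = np.linalg.qr(Yc)
    U, _, _ = np.linalg.svd(Qx.T @ Qy)
    variates = Qx @ U[:, :k] * np.sqrt(n - 1)  # unit-variance canonical variates
    return Zx.T @ variates / (n - 1)           # rows: variables, columns: functions


rng = np.random.default_rng(3)
n = 251
shared = rng.normal(size=(n, 1))            # synthetic data, for illustration only
X = 0.8 * shared + rng.normal(size=(n, 5))
Y = 0.8 * shared + rng.normal(size=(n, 3))

rs = structure_coefficients(X, Y)
rs_squared = rs ** 2  # proportion of each variable's variance shared with a variate
print(np.round(rs_squared[:, 0], 2))
```

Read in the other direction, a squared structure coefficient of 0.80, as reported for FORUM, corresponds to a structure coefficient of about 0.89 in absolute value.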






By summing the squared structure coefficients across the functions, or averaging them
across the variables within a given function, one obtains the next two coefficients of
interest: communality and adequacy. The communality coefficient for a variable
(represented by h²) equals the sum of the squared structure coefficients across all the
functions and is an indication of what proportion of the variable’s variance is
reproducible from the canonical functions. In other words, it indicates how useful the
variable was in defining the canonical solution (Thompson, 1984). As seen in Table 3, the
communality coefficients indicate that the researcher is not getting as much from the
BULLETIN and MAIL variables as from the FORUM, REPRESENTATIVE, and CAMPUS variables.

The adequacy coefficient for a given function is the average of the squared 
structure coefficients for all the variables in the set and indicates how adequately the 
analysis represents the total variance in the unweighted set. In this example, the first 
function has a much larger adequacy coefficient than the other two functions, although 
the difference is more sizable for the set of variables measuring the direct contact 
methods. 
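
Communality and adequacy are row and column summaries of the same matrix of squared structure coefficients. The sketch below shows both on a small matrix whose values are invented purely for illustration; they are not the Table 3 values, and the generic layout (five variables, three functions) is an assumption chosen to mirror the direct contact set.

```python
import numpy as np

# Hypothetical structure coefficients (rows: 5 variables, columns: 3 functions).
# These values are invented for illustration and are NOT taken from Table 3.
rs = np.array([
    [0.40, 0.30, 0.20],
    [0.35, -0.25, 0.50],
    [0.90, 0.10, -0.05],
    [0.85, -0.20, 0.10],
    [0.45, 0.60, 0.30],
])
squared = rs ** 2

# Communality (h^2): one value per variable, summed across functions.
communality = squared.sum(axis=1)

# Adequacy: one value per function, averaged across the variables in the set.
adequacy = squared.mean(axis=0)

print(np.round(communality, 4))
print(np.round(adequacy, 4))
```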

Also of interest to the researcher is the relationship between the individual
variables in one variable set and the canonical variates in the other variable set. In other
words, what is the extent to which a variable contributes to explaining the composite, or
linear combination, of the variable set to which the variable of interest does not belong?
The coefficient that addresses this question is referred to as an index coefficient. An
index coefficient is the correlation between an unweighted variable in one set and the
weighted and aggregated variables in the other set (Thompson, 1984). As seen in Table
3, ADVISOR has the largest index coefficient with the direct contact variate,
and FORUM and REPRESENTATIVE have the largest index coefficients with the
outside influence variate.
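
An index coefficient differs from a structure coefficient only in which variate the variable is correlated with. The sketch below computes both for synthetic data and checks the known identity that each index coefficient equals the corresponding structure coefficient multiplied by the canonical correlation for that function. Function names and data are hypothetical; this is an illustration rather than the paper's SAS output.

```python
import numpy as np


def variates_and_rc(X, Y):
    """Unit-variance canonical variates for both sets plus canonical correlations."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n, k = X.shape[0], min(X.shape[1], Y.shape[1])
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    scale = np.sqrt(n - 1)
    return Qx @ U[:, :k] * scale, Qy @ Vt.T[:, :k] * scale, s[:k]


def correlate(Z, V):
    """Correlation of every column of Z with every column of V."""
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    Vs = (V - V.mean(axis=0)) / V.std(axis=0, ddof=1)
    return Zs.T @ Vs / (Z.shape[0] - 1)


rng = np.random.default_rng(4)
n = 251
shared = rng.normal(size=(n, 1))            # synthetic data, for illustration only
X = 0.8 * shared + rng.normal(size=(n, 5))
Y = 0.8 * shared + rng.normal(size=(n, 3))

u, v, rc = variates_and_rc(X, Y)
structure_x = correlate(X, u)  # X variables with their OWN set's variates
index_x = correlate(X, v)      # X variables with the OTHER set's variates

# Each index coefficient is the structure coefficient shrunk by Rc.
print(np.allclose(index_x, structure_x * rc))
```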

The final component of canonical correlation analysis is the computation of
redundancy coefficients. For a variable set on a function, a redundancy coefficient (Rd)
is computed by multiplying the adequacy coefficient for the set by Rc² for the function. It
indicates the average proportion of variance for variables in one set that is reproducible
with (i.e., redundant with) the variables in the other set. Table 3 shows the Rd for each
function.
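
The redundancy computation is a single multiplication. In the sketch below the canonical correlation is the value quoted in the text, while the adequacy value is invented for illustration and is not taken from Table 3; the function name is likewise hypothetical.

```python
def redundancy(adequacy: float, rc: float) -> float:
    """Redundancy coefficient: adequacy times the squared canonical correlation."""
    return adequacy * rc ** 2


rc1 = 0.65                    # first canonical correlation reported in the text
hypothetical_adequacy = 0.40  # invented value, NOT taken from Table 3

rd = redundancy(hypothetical_adequacy, rc1)
print(round(rd, 3))
```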

It is often argued that redundancy coefficients should only be interpreted in the
"few concurrent validity applications in which both variable sets consist of the same
variables" (Thompson, 1991, p. 89). Cramer and Nicewander (1979) argued that the
redundancy coefficient is not truly multivariate "in the strict sense because it is
unaffected by the intercorrelations of the variables being predicted. The redundancy
index is only multivariate in the sense that it involves several criterion variables" (p. 43).
Therefore, for the heuristic purposes of this paper, Rd values were computed and
presented; however, no interpretations or conclusions will be drawn, considering that the
research situation from which the data were drawn does not fit the application of
redundancy coefficients suggested by Thompson (1991).

Summary 

Canonical correlation analysis is a "parsimonious way of breaking down the 
association between two sets of variables through the use of linear combinations" 
(Stevens, 1996). As a result of the analysis, many types of coefficients can be generated
and interpreted. These coefficients are only considered stable and reliable if the number
of subjects per variable is sufficiently large. 

The first of these coefficients, the canonical correlation, is the bivariate 
correlation between the composite scores for the two sets of variables. Two additional 
coefficients, the canonical function and structure coefficients, address the contribution a 
single variable makes to the explanatory power of the set of variables to which the 
variable belongs. The communality coefficient explains how useful the variable is in
defining the canonical solution. The adequacy coefficient indicates how adequately the 
analysis represents the total variance in the unweighted set. The extent to which a 
variable contributes to explaining the composite of the variable set to which the variable 
of interest does not belong is the index coefficient. A final outcome from canonical 
correlation analysis is the redundancy coefficient, which indicates the average proportion 
of variance for variables in one set that is reproducible with the variables in the other set. 
While the coefficient is easy to calculate, it is not recommended for interpretation in most 
cases. 






REFERENCES 



Baggaley, A. R. (1981). Multivariate analysis: An introduction for consumers of
behavioral research. Evaluation Review, 5, 123-131.

Barcikowski, R., & Stevens, J. P. (1975). A Monte Carlo study of the stability of
canonical correlations, canonical weights and canonical variate-variable correlations.
Multivariate Behavioral Research, 10, 353-364.

Cramer, E. M., & Nicewander, W. A. (1979). Some symmetric, invariant measures of
multivariate association. Psychometrika, 44, 43-54.

Kerlinger, F. N., & Pedhazur, E. J. (1973). Multiple regression in behavioral research.
New York: Holt, Rinehart and Winston.

Levine, M. S. (1977). Canonical analysis and factor comparison. Newbury Park: Sage.

Meredith, W. (1964). Canonical correlations with fallible data. Psychometrika, 29, 55-65.

Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.).
Mahwah, NJ: Erlbaum.

Tatsuoka, M. M. (1971). Multivariate analysis: Techniques for educational and
psychological research. New York: Wiley.

Thompson, B. (1984). Canonical correlation analysis: Uses and interpretation.
Newbury Park: Sage.

Thompson, B. (1991). A primer on the logic and use of canonical correlation analysis.
Measurement and Evaluation in Counseling and Development, 24, 80-95.



TABLE 1

[Table 1: Pearson correlations among the direct contact measures, among the outside
influence measures, and between the two sets. The table body is illegible in the source
scan; only the following row survives.]

Campus 0.1350 0.2817 0.3018



TABLE 2

[Table 2: canonical correlations (Rc) and squared canonical correlations (Rc²), with
significance levels, for canonical functions 1 through 3. The table body is illegible in
the source scan.]



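TABLE 3

[Table 3: coefficients for all three canonical functions, including the squared structure,
communality, index, and redundancy (Rd) coefficients cited in the text. The table body is
missing from the source scan.]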







